Smart Paper Notes

Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling

Vanessa A. Godoy , Gian F. Napa-García , J. Jaime Gómez-Hernández

Category: Machine Learning

2022-07-08

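The ensemble random forest filter (ERFF) is presented as an alternative to the ensemble Kalman filter (EnKF) for inverse modeling. The EnKF is a data assimilation method that forecasts and updates parameter estimates sequentially in time as observations are collected. The update step is based on the experimental covariances computed from an ensemble of realizations, and the updates are given as linear combinations of the differences between the observations and the forecasted system state values. The ERFF replaces the linear combination in the update step with a nonlinear function represented by a random forest. In this way, the nonlinear relationships between the parameters to be updated and the observations can be captured, producing a better update. The ERFF is demonstrated for log-conductivity identification from piezometric head observations in a number of scenarios that vary in degree of heterogeneity (log-conductivity variances from 1 to 6.25 (ln m/d)²), number of realizations in the ensemble (50 or 100), and number of piezometric head observations (18 or 36). In all scenarios the ERFF works well, reconstructing the spatial heterogeneity of the log-conductivity while matching the piezometric heads observed at the selected control points. For benchmarking, the ERFF is compared with the restart EnKF; the ERFF outperforms the EnKF for the ensemble sizes used (which are small, as in typical EnKF applications). Only when the number of realizations grows to 500 does the restart EnKF match the performance of the ERFF, albeit at triple the computational cost.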
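The linear update step described in the abstract can be illustrated with a minimal sketch of a generic stochastic EnKF parameter update. This is not the authors' restart EnKF or their ERFF code: the function name `enkf_parameter_update`, the toy linear forward model `H`, and all numerical values are assumptions made for illustration. According to the abstract, the ERFF would replace the linear gain below with a random forest.

```python
import numpy as np

def enkf_parameter_update(params, predicted, observed, obs_noise_std, rng):
    """Kalman-type linear update of a parameter ensemble (illustrative sketch).

    params:    (n_real, n_param) ensemble of parameters
    predicted: (n_real, n_obs) forecasted states at the observation locations
    observed:  (n_obs,) measured values
    """
    n_real = params.shape[0]
    # Ensemble anomalies: deviations from the ensemble mean.
    dp = params - params.mean(axis=0)
    dy = predicted - predicted.mean(axis=0)
    # Experimental covariances computed from the ensemble.
    c_py = dp.T @ dy / (n_real - 1)            # (n_param, n_obs)
    c_yy = dy.T @ dy / (n_real - 1)            # (n_obs, n_obs)
    r = obs_noise_std ** 2 * np.eye(observed.size)
    # Gain K = C_py (C_yy + R)^-1, via a linear solve (C_yy + R is symmetric).
    gain = np.linalg.solve(c_yy + r, c_py.T).T
    # Perturb the observations once per realization (stochastic EnKF).
    perturbed = observed + obs_noise_std * rng.standard_normal(predicted.shape)
    # Update: parameters plus a linear combination of the innovations.
    return params + (perturbed - predicted) @ gain.T

# Toy usage with a hypothetical linear forward model y = H p.
rng = np.random.default_rng(0)
H = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, -1.0]])
observed = H @ np.array([1.0, -2.0, 0.5])
prior = rng.standard_normal((50, 3))
updated = enkf_parameter_update(prior, prior @ H.T, observed, 0.01, rng)
```

In this toy setting the forward model is exactly linear, so the linear update is enough to make the updated ensemble reproduce the observations; the abstract's argument is that this no longer holds when the parameter-observation relationship is nonlinear.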


A Machine Learning Enhanced Approach for Automated Sunquake Detection in Acoustic Emission Maps

Vanessa Mercea , Alin Razvan Paraschiv , Daniela Adriana Lacatus , Anca Marginean , Diana Besliu-Ionescu

Categories: Computer Vision | Machine Learning

2022-12-13

Sunquakes are seismic emissions visible on the solar surface, associated with some solar flares. Although discovered in 1998, they have only recently become a more commonly detected phenomenon. Despite the availability of several manual detection guidelines, to our knowledge, the astrophysical data produced for sunquakes is new to the field of Machine Learning. Detecting sunquakes is a daunting task for human operators and this work aims to ease and, if possible, to improve their detection. Thus, we introduce a dataset constructed from acoustic egression-power maps of solar active regions obtained for Solar Cycles 23 and 24 using the holography method. We then present a pedagogical approach to the application of machine learning representation methods for sunquake detection using AutoEncoders, Contrastive Learning, Object Detection and recurrent techniques, which we enhance by introducing several custom domain-specific data augmentation transformations. We address the main challenges of the automated sunquake detection task, namely the very high noise patterns in and outside the active region shadow and the extreme class imbalance given by the limited number of frames that present sunquake signatures. With our trained models, we find temporal and spatial locations of peculiar acoustic emission and qualitatively associate them to eruptive and high energy emission. While noting that these models are still in a prototype stage and there is much room for improvement in metrics and bias levels, we hypothesize that their agreement on example use cases has the potential to enable detection of weak solar acoustic manifestations.


A New Aligned Simple German Corpus

Vanessa Toborek , Moritz Busch , Malte Boßert , Pascal Welke , Christian Bauckhage

Category: Natural Language Processing

2022-09-02

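"Leichte Sprache", the German counterpart to Simple English, is a regulated language that aims to make complex written language accessible to groups of people who would otherwise be unable to access it. We present a new sentence-aligned monolingual corpus for Simple German and German. It contains multiple document-aligned sources, which we have aligned using automatic sentence-alignment methods. We evaluate our alignments on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1 score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under the MIT license.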
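As a rough illustration of how an F1 score over sentence alignments can be computed, the sketch below treats each alignment as a pair of sentence indices and counts exact pair matches. The abstract does not describe the paper's exact evaluation protocol, so the pair representation, the function name `alignment_f1`, and the sample data are assumptions.

```python
def alignment_f1(predicted_pairs, gold_pairs):
    """F1 score of predicted alignments against manually labelled ones.

    Each alignment is a (simple_sentence_index, standard_sentence_index) pair;
    a prediction counts as correct only if the exact pair is in the gold set.
    """
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: three of the four predicted pairs match the gold labels.
gold = {(0, 0), (1, 1), (2, 3), (3, 4)}
predicted = {(0, 0), (1, 1), (2, 2), (3, 4)}
score = alignment_f1(predicted, gold)  # precision = recall = 0.75
```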


A Probabilistic Autoencoder for Type Ia Supernovae Spectral Time Series

George Stein , Uros Seljak , Vanessa Bohm , G. Aldering , P. Antilogus , C. Aragon , S. Bailey , C. Baltay , S. Bongard , K. Boone

Category: Machine Learning

2022-07-15

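We construct a physically parameterized probabilistic autoencoder (PAE) to learn the intrinsic diversity of type Ia supernovae (SNe Ia) from a sparse set of spectral time series. The PAE is a two-stage generative model composed of an autoencoder (AE) that is interpreted probabilistically after training by means of a normalizing flow (NF). We demonstrate that the PAE learns a low-dimensional latent space that captures the nonlinear range of features present in the population, and that it can accurately model the spectral evolution of SNe Ia across the full range of wavelengths and observation times directly from the data. By introducing a correlation penalty term and a multi-stage training setup alongside our physically parameterized network, we show that intrinsic and extrinsic modes of variability can be separated during training, removing the need for additional models to perform magnitude standardization. We then use the PAE in a number of downstream tasks on SNe Ia for increasingly precise cosmological analyses, including the automatic detection of SN outliers, the generation of samples consistent with the data distribution, and the solution of the inverse problem in the presence of noisy and incomplete data to constrain cosmological distance measurements. We find that the optimal number of intrinsic model parameters appears to be three, in line with previous studies, and show that we can standardize our test sample of SNe Ia to $0.091 \pm 0.010$ mag, which corresponds to $0.074 \pm 0.010$ mag if peculiar velocity contributions are removed. Trained models and code are released at https://github.com/georgestein/supaernova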


A Transfer Learning Pipeline for Educational Resource Discovery with Application in Leading Paragraph Generation

Irene Li , Thomas George , Alexander Fabbri , Tammy Liao , Benjamin Chen , Rina Kawamura , Richard Zhou , Vanessa Yan , Swapnil Hingmire , Dragomir Radev

Categories: Natural Language Processing | Artificial Intelligence

2022-01-07

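Effective human learning depends on a wide selection of educational materials that align with the learner's current understanding of the topic. While the Internet has revolutionized human learning and education, a substantial resource accessibility barrier still exists. Namely, the excess of online information can make it challenging to navigate and discover high-quality learning materials. In this paper, we propose the educational resource discovery (ERD) pipeline, which automates web resource discovery for novel domains. The pipeline consists of three main steps: data collection, feature extraction, and resource classification. We start with a known source domain and conduct resource discovery on two unseen target domains via transfer learning. We first collect frequent queries from a set of seed documents and search the web to obtain candidate resources, such as lecture slides and introductory blog posts. We then introduce a novel pretrained information retrieval deep neural network model, query-document masked language modeling (QD-MLM), to extract deep features of these candidate resources. We apply a tree-based classifier to decide whether a candidate is a positive learning resource. The pipeline achieves F1 scores of 0.94 and 0.82 when evaluated on two similar but novel target domains. Finally, we demonstrate how this pipeline can benefit an application: leading paragraph generation for surveys. To the best of our knowledge, this is the first study to consider various web resources for survey generation. We also release a corpus of 39,728 manually labeled web resources and 659 queries from NLP, computer vision (CV), and statistics (STATS).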


CLICKER: A Computational LInguistics Classification Scheme for Educational Resources

Swapnil Hingmire , Irene Li , Rena Kawamura , Benjamin Chen , Alexander Fabbri , Xiangru Tang , Yixin Liu , Thomas George , Tammy Liao , Wai Pan Wong

Category: Natural Language Processing

2021-12-16

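A classification scheme for a scientific subject gives an overview of its body of knowledge. It can also be used to facilitate access to research articles and other materials related to the subject. For example, the ACM Computing Classification System (CCS) is used in the ACM Digital Library search interface and for indexing computer science papers. We observed that no comprehensive classification system like CCS or the Mathematics Subject Classification (MSC) exists for computational linguistics (CL) and natural language processing (NLP). We propose CLICKER, a classification scheme for CL/NLP based on an analysis of online lectures from 77 university courses on this subject. The currently proposed taxonomy includes 334 topics and focuses on the educational aspects of CL/NLP; it is based primarily, but not exclusively, on lecture notes from NLP courses. We discuss how such a taxonomy can help in various real-world applications, including tutoring platforms, resource retrieval, resource recommendation, prerequisite chain learning, and survey generation.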


Novel Features for Time Series Analysis: A Complex Networks Approach

Vanessa Freitas Silva , Maria Eduarda Silva , Pedro Ribeiro , Fernando Silva

Category: Machine Learning

2021-10-11

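Being able to capture the characteristics of a time series with a feature vector is a very important task with a multitude of applications, such as classification, clustering, or forecasting. Usually, the features are obtained from linear and nonlinear time series measures, which may present several data-related drawbacks. In this work we introduce NetF as an alternative set of features, incorporating several representative topological measures of different complex network mappings of the time series. Our approach does not require data preprocessing and is applicable regardless of the characteristics of the data. Exploring our novel feature vector, we are able to connect the mapped network features to properties inherent in diverse time series models, showing that NetF can be useful for characterizing temporal data. Furthermore, we demonstrate the applicability of our methodology to clustering synthetic and benchmark time series sets, comparing its performance with more conventional features and showing how NetF can achieve high-accuracy clusters. Our results are very promising: network features from different mapping methods capture different properties of the time series, adding a different and rich feature set to the literature.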
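To make the idea of mapping a time series to a network and reading off topological measures more concrete, the sketch below uses the horizontal visibility graph, one well-known mapping of this kind. The abstract does not say which mappings or measures NetF uses, so this choice, the function names, and the two degree-based features are assumptions and do not reproduce the NetF feature set.

```python
def horizontal_visibility_edges(series):
    """Edges (i, j), i < j, of the horizontal visibility graph of a series.

    Two points are linked when every value strictly between them is lower
    than both endpoints; adjacent points are always linked.
    """
    edges = []
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # point i cannot see anything further to the right
    return edges

def degree_features(series):
    """Two simple topological measures of the mapped network."""
    degree = [0] * len(series)
    for i, j in horizontal_visibility_edges(series):
        degree[i] += 1
        degree[j] += 1
    return {"average_degree": sum(degree) / len(degree), "max_degree": max(degree)}

# Small worked example.
example = [3, 1, 2, 4]
edges = horizontal_visibility_edges(example)
features = degree_features(example)
```

No preprocessing of the series is needed for this mapping, since it depends only on the ordering of the values.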


A conditional one-output likelihood formulation for multitask Gaussian processes

Óscar García-Hinde , Vanessa Gómez-Verdejo , Manel Martínez-Ramón

Categories: Machine Learning | Machine Learning (Statistics)

2020-06-05

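Multitask Gaussian processes (MTGP) are the Gaussian process (GP) framework's solution for multi-output regression problems in which the $T$ elements of the regressors cannot be considered conditionally independent given the observations. Standard MTGP models assume that there exist both a multitask covariance matrix, which is a function of an intertask matrix, and a noise covariance matrix. These matrices need to be approximated by a low-rank simplification of order $P$ in order to reduce the number of parameters to be learned from $T^2$ to $TP$. Here we introduce a novel approach that simplifies multitask learning by reducing it to a set of conditioned univariate GPs without the need for any low-rank approximation, therefore completely eliminating the requirement to select an adequate value for the hyperparameter $P$. At the same time, by extending this approach with both a hierarchical and an approximate model, the proposed extensions are capable of recovering the multitask covariance and noise matrices after learning only $2T$ parameters, avoiding the validation of any model hyperparameter and reducing the overall complexity of the model as well as the risk of overfitting. Experimental results on synthetic and real problems confirm the advantages of this inference approach in its ability to accurately recover the original noise and signal matrices, as well as the performance improvement achieved in comparison with other state-of-the-art MTGP approaches. We have also integrated the model with standard GP toolboxes, showing that it is computationally competitive with state-of-the-art options.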
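A quick worked example may help put the parameter counts quoted above in proportion. The values $T = 10$ and $P = 3$ are chosen only for illustration, and the counts follow the orders stated in the abstract rather than an exact tally from the paper.

```latex
\begin{align*}
\text{full intertask matrix:} \quad & T^2 = 10^2 = 100 \\
\text{rank-}P\text{ approximation:} \quad & TP = 10 \times 3 = 30 \\
\text{conditional formulation:} \quad & 2T = 2 \times 10 = 20
\end{align*}
```

Under these assumed values, the $2T$ count is the smallest of the three and, unlike $TP$, does not depend on a rank $P$ that would have to be validated.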


A Tutorial on Parametric Variational Inference

Jens Sjölund

Categories: Machine Learning (Statistics) | Machine Learning

2023-01-03

Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
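The link between optimization and the marginal likelihood can be written as a standard identity. The notation here is an assumption and may differ from the tutorial's: $x$ denotes the data, $z$ the latent variables, and $q_\lambda$ a variational family with parameters $\lambda$.

```latex
\begin{equation*}
\log p(x)
  = \underbrace{\mathbb{E}_{q_\lambda(z)}\!\left[\log p(x, z) - \log q_\lambda(z)\right]}_{\mathrm{ELBO}(\lambda)}
  + \mathrm{KL}\!\left(q_\lambda(z) \,\|\, p(z \mid x)\right)
\end{equation*}
```

Because the KL term is non-negative, $\mathrm{ELBO}(\lambda)$ is a lower bound on $\log p(x)$, and maximizing it over $\lambda$ both tightens the approximation to the marginal likelihood and pulls $q_\lambda(z)$ toward the posterior $p(z \mid x)$.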


A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge

Haodi Ma , Daisy Zhe Wang

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-03

Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly added relations, which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
